HEAPABLE SEQUENCES AND SUBSEQUENCES

Abstract. Let us call a sequence of numbers heapable if they can be sequentially inserted to
form a binary tree with the heap property, where each insertion subsequent to the first occurs
at a leaf of the tree, i.e. below a previously placed number. In this paper we consider a variety
of problems related to heapable sequences and subsequences that do not appear to have been
studied previously. Our motivation for introducing these concepts is two-fold. First, such problems
correspond to natural extensions of the well-known secretary problem for hiring an organization
with a hierarchical structure. Second, from a purely combinatorial perspective, our problems are
interesting variations on similar longest increasing subsequence problems, a problem paradigm that
has led to many deep mathematical connections.

We provide several basic results. We obtain an efficient algorithm for determining the
heapability of a sequence, and also prove that the question of whether a sequence can be arranged in a
complete binary heap is NP-hard. Regarding subsequences we show that, with high probability,
the longest heapable subsequence of a random permutation of n numbers has length (1 - o(1))n,
and a subsequence of length (1 - o(1))n can in fact be found online with high probability. We
similarly show that for a random permutation a subsequence that yields a complete heap of size αn
for a constant α can be found with high probability. Our work highlights the interesting structure
underlying this class of subsequence problems, and we leave many further interesting variations
open for future work.

1. Introduction

The study of longest increasing subsequences is a fundamental combinatorial problem, and such 
sequences have been the focus of hundreds of papers spanning decades. In this paper, we consider a 
natural, new variation on the theme. Our main question revolves around the problem of finding the 
longest heapable subsequence. Formal definitions are given in Section 2, but intuitively: a sequence
is heapable if the elements can be sequentially placed one at a time to form a binary tree with 
the heap property, with the first element being placed at the root and every subsequent element 
being placed as the child of some previously placed element. For example, the sequence 1, 3, 5, 2, 4 
is heapable, but 1, 5, 3, 2, 4 is not. The longest heapable subsequence of a sequence then has the
obvious meaning. (Recall that a subsequence need not be contiguous within the sequence.) 

Our original motivation for examining such problems stems from considering variations on the 
well-known secretary problem [U [6] where the hiring is not for a single employee but for an
organization. For example, Broder et al. [3] consider an online hiring rule where a new employee can
only be hired if they are better than all previous employees according to some scoring or ranking 
mechanism. In this scenario, with low ranks being better, employees form a decreasing subsequence 
that is chosen online. They also consider rules such as a new employee must be better than the 
median current employee, and consider the corresponding growth rate of the organization. 

A setting considered in this paper corresponds to, arguably, a more realistic scenario where 
hiring is done in order to fill positions in a given organization chart, where we focus on the case of 
a complete binary tree. A node corresponds to the direct supervisor of its children, and we assume 
the following reasonable hiring restriction: a boss must have a higher rank than their reporting 
employees.¹ A natural question is how to best hire in such a setting. Note that, in this case, our
subsequence of hires is not only heapable, but the heap has a specific associated shape. As another 
variation our organization tree may not have a fixed shape, but must simply correspond to a binary 
tree with the heap property — at most two direct reports per boss, with the boss having a higher 
rank. 

We believe that even without this motivation, the combinatorial questions of heapable sequences 
and subsequences are compelling in their own right. Indeed, while the various hiring problems 
correspond to online versions of the problem, from a combinatorial standpoint, offline variations of 
the problem are worth studying as well. Once we open the door to this type of problem, there are 
many fundamental questions that can be asked, such as: 

• Is there an efficient algorithm for determining if a sequence is heapable? 

• Is there an efficient algorithm for finding the longest heapable subsequence? 

• What is the probability that a random permutation is heapable? 

• What is the expected length and size distribution of the longest heapable subsequence of a 
random permutation? 

We have answered some, but not all, of these questions, and have considered several others that we 
describe here. We view our paper as a first step that naturally leads to many questions that can 
be considered in future work. 

1.1. Overview of Results. We begin with heapable sequences, giving a natural greedy algorithm
that decides whether a given sequence of length n is heapable using O(n) ordered dictionary
operations. Unfortunately, when we place further restrictions on the shape of the heap, such as insisting
on a complete binary tree, determining heapability becomes NP-hard. Our reduction involves
gadgets that force subsequences to be heaped into specific shapes, which we exploit in delicate ways.
However, when the input sequence is restricted to 0-1 values, the problem again becomes tractable and
we give a linear-time algorithm to solve it. This case corresponds naturally to the scenario where
candidates are rated as either strong or weak and strong candidates will only work for other strong
candidates (weak candidates are happy to work for whomever).

¹ We do not claim that this always happens in the real world.

Turning to heapable subsequences, we show that with high probability, the length of the longest 
heapable subsequence in a random permutation is (1 - o(1))n. This result also holds in the online
setting where elements are drawn uniformly at random from the unit interval, or even when we only 
know the ranking of a candidate relative to the previous candidates. In the case when we restrict 
the shape of the tree to complete binary trees, we show that the longest heapable subsequence has 
length linear in n with high probability in both the offline and online settings. In all cases our results 
are constructive, so they provide natural hiring strategies in both the online and offline settings. 
Throughout the paper, we conduct Monte Carlo simulations to investigate scaling properties of 
heapable subsequences at a finer granularity than our current analyses enable. Finally, we discuss 
several attractive open problems. 

1.2. Previous Work. The problems we consider are naturally related to the well-known longest 
increasing subsequence problem. As there are hundreds of papers on this topic, we refer the reader 
to the excellent surveys [U [8] for background. 

We briefly summarize some of the important results in this area that we make use of in this paper. 
In what follows, we use LIS for longest increasing subsequence and LDS for longest decreasing 
subsequence. Among the most basic results is that every sequence of n^2 + 1 distinct numbers has
either an LIS or LDS of length at least n + 1 [HIH]. An elegant way to see this is by greedy patience 
sorting [T]. In greedy patience sorting, the number sequence, thought of as a sequence of cards, is 
sequentially placed into piles. The first card starts the leftmost pile. For each subsequent card, if it 
is larger than the top card on every pile, it is placed on a new pile to the right of all previous piles. 
Otherwise, the card is placed on the top of the leftmost pile for which the top card is larger than 
the current card. Each pile is a decreasing subsequence, while the number of piles is the length of 
the LIS - the LIS is clearly at most the number of piles, and since every card in a pile has some 
smaller card in the previous pile, the LIS is at least the number of piles as well. 
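The pile rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, assuming distinct card values; the name `patience_piles` and the auxiliary list `tops` are our own and do not come from the cited references.

```python
from bisect import bisect_right

def patience_piles(cards):
    """Greedy patience sorting; returns the piles, each listed bottom to top."""
    piles = []  # piles[i][-1] is the top card of pile i
    tops = []   # tops[i] == piles[i][-1]; stays increasing from left to right
    for card in cards:
        # Index of the leftmost pile whose top card is larger than `card`.
        i = bisect_right(tops, card)
        if i == len(piles):
            # Larger than every top card: start a new pile on the right.
            piles.append([card])
            tops.append(card)
        else:
            piles[i].append(card)
            tops[i] = card
    return piles

piles = patience_piles([2, 7, 1, 8, 3, 9, 4])
print(len(piles))  # number of piles, i.e. the length of the LIS
```

Each returned pile is a decreasing subsequence, and the number of piles equals the LIS length, matching the argument in the text.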

In the case of the LIS for a random permutation of n elements, it is known that the asymptotic 
expected length of the LIS grows as 2√n. More detail regarding the distribution and concentration
results can be found in [2]. In the online setting, where one must choose whether to add an element
and the goal is to obtain the longest possible increasing subsequence, there are effective strategies
that obtain an asymptotic expected length of √(2n). Both results also hold in the setting where
instead of a random permutation, the sequence is a collection of independent, uniform random 
numbers from (0, 1). 

2. Definitions 

Let $x = x_1, \ldots, x_n$ be a sequence of n real numbers. We say x is heapable if there exists a binary
tree T with n nodes such that every node is labelled with exactly one element from the sequence
x and for every non-root node $x_i$ and its parent $x_j$, $x_j \le x_i$ and $j < i$. Notice that T serves as
a witness for the heapability of x. We say that x is completely heapable if x is heapable and the
solution T is a complete binary tree.
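To make the definition concrete, the following sketch checks whether a proposed tree is a witness. The representation is an assumption of ours, not the paper's: the tree is given as a list `parents`, where `parents[i]` is the 0-based index of the parent of element i and `parents[0]` is `None` for the root.

```python
def is_heap_witness(seq, parents):
    """Check that `parents` encodes a binary tree witnessing heapability of `seq`."""
    if not seq or len(parents) != len(seq) or parents[0] is not None:
        return False
    child_count = [0] * len(seq)
    for i in range(1, len(seq)):
        j = parents[i]
        # The parent must appear earlier in the sequence and be no larger.
        if j is None or not (0 <= j < i) or seq[j] > seq[i]:
            return False
        child_count[j] += 1
        if child_count[j] > 2:  # binary tree: at most two children per node
            return False
    return True

print(is_heap_witness([1, 3, 5, 2, 4], [None, 0, 1, 0, 1]))
```

Because every parent index is smaller than its child's index, any list passing these checks is automatically a tree rooted at the first element.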

If T is a binary tree with k nodes, then there are k + 1 free slots in which to add a new number. We
say that the value of a free child slot is the value of its parent, as this represents the minimum value
that can be placed in the slot while preserving the heap property. Let $\mathrm{sig}(T) = (x_1, x_2, \ldots, x_{k+1})$
be the values of the free slots of T in non-decreasing sorted order. We call sig(T) the signature of
T. For example, heaping the sequence 1, 4, 2, 2 yields a tree with 5 slots and signature (2, 2, 2, 4, 4).
Given two binary trees $T_1$ and $T_2$ of the same size k, we say that $T_1$ dominates $T_2$ if and only if
$\mathrm{sig}(T_1)[i] \le \mathrm{sig}(T_2)[i]$ for all $1 \le i \le k + 1$, where sig(T)[i] is the value of slot i of T.
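As a small illustration of signatures and dominance, the sketch below computes sig(T) from a sequence and a parent list. The parent-list encoding (0-based indices, `None` for the root) and the function names are our own assumptions for the example.

```python
def signature(seq, parents):
    """Sorted values of the free child slots of the tree encoded by `parents`."""
    used = [0] * len(seq)
    for p in parents:
        if p is not None:
            used[p] += 1
    sig = []
    for value, count in zip(seq, used):
        # Each node has two child slots; the unused ones carry the node's value.
        sig.extend([value] * (2 - count))
    return sorted(sig)

def dominates(sig1, sig2):
    """True if sig1 is slot-by-slot no larger than sig2 (same number of slots)."""
    return len(sig1) == len(sig2) and all(a <= b for a, b in zip(sig1, sig2))

# The example from the text: 1 is the root, 4 and the first 2 are its
# children, and the second 2 is placed below the first 2.
print(signature([1, 4, 2, 2], [None, 0, 0, 2]))  # [2, 2, 2, 4, 4]
```

A tree with smaller slot values can accept every number that a tree with larger slot values can, which is why dominance is defined with the inequality in this direction.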

Now define the depth of a slot i in T to be the depth of the parent node associated with slot i
of T. We say that $T_1$ and $T_2$ have equivalent frontiers if and only if there is a bijection between slots
of $T_1$ and slots of $T_2$ that preserves both value and depth of slots. A sequence is uniquely heapable
if all valid solution trees for the sequence have equivalent frontiers.

Given a sequence, we say a subsequence (which need not be contiguous) is heapable with the 
obvious meaning, namely that the subsequence is heapable when viewed as an ordered sequence. 
Hence we may talk about the longest heapable subsequence (LHS) of a sequence, and similarly the 
longest completely-heapable subsequence (LCHS). 

We also consider heapability problems on permutations. In this case, the input sequence is a 
permutation of the integers 1, . . . , n. For offline heapability problems, heaping an arbitrary sequence 
of n distinct real numbers is clearly equivalent to heaping the corresponding (i.e. rank-preserving) 
permutation of the first n integers. Here we assume the input sequence is drawn uniformly at 
random from the set of all n! permutations on [1, n]. Several of our results show that, given a
random permutation x on [1, n], the LHS or LCHS has length f(n) with high probability, i.e.
with probability 1 - o(1).

3. Heapable Sequences 

3.1. Heapability in polynomial time. In this section we give a simple greedy algorithm
Greedy-Sig that decides whether a given input sequence is heapable using O(n) ordered associative array
operations, and explicitly constructs the heap when feasible.

Greedy-Sig builds a binary heap for a sequence $x = x_1, \ldots, x_n$ by sequentially adding $x_i$ as
a child to the tree $T_{i-1}$ built in the previous iteration, if such an addition is feasible. The
greedy insertion rule is to add $x_i$ into the slot with the largest value smaller than or equal to $x_i$. To
support efficient updates, Greedy-Sig also maintains the signature of the tree, $\mathrm{sig}(T_i)$, where each
element in the signature points to its associated slot in $T_i$. Insertion of $x_i$ therefore corresponds
to first identifying the predecessor, $\mathrm{pred}(x_i)$, in $\mathrm{sig}(T_{i-1})$ (if it does not exist, the sequence is not
heapable). Next, $x_i$ is inserted into the corresponding slot in $T_{i-1}$, coupled with deleting $\mathrm{pred}(x_i)$
from $\mathrm{sig}(T_{i-1})$, and inserting two copies of $x_i$, the slots for $x_i$'s children. Greedy-Sig starts with
the tree $T_1 = x_1$ and iterates until it exhausts x (in which case it returns $T = T_n$) or finds that
the sequence is not heapable. Standard dictionary data structures supporting PRED, INSERT and
DELETE require O(log n) time per operation, but we can replace each number with its rank in the
sequence, and use van Emde Boas trees [9] to index the signatures, yielding an improved bound of
O(log log n) time per operation, albeit in the word RAM model.
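The greedy rule can be sketched in Python as follows. This is a minimal illustration rather than the implementation analysed above: it keeps the signature in a plain sorted list, so each deletion costs O(n) time instead of the O(log n) or O(log log n) bounds quoted in the text. The function name and the parent-list output format are our own choices.

```python
from bisect import bisect_right, insort

def greedy_sig(seq):
    """Return a parent list (None for the root) if seq is heapable, else None."""
    if not seq:
        return []
    parents = [None]
    # Signature entries are (slot value, index of the node owning the slot).
    sig = [(seq[0], 0), (seq[0], 0)]
    for i in range(1, len(seq)):
        x = seq[i]
        # Position of the last entry whose slot value is <= x, i.e. pred(x).
        pos = bisect_right(sig, (x, len(seq))) - 1
        if pos < 0:
            return None  # no slot can accept x: the sequence is not heapable
        _, parent = sig.pop(pos)
        parents.append(parent)
        # Placing x consumes one slot and opens two new slots of value x.
        insort(sig, (x, i))
        insort(sig, (x, i))
    return parents

print(greedy_sig([1, 3, 5, 2, 4]))  # [None, 0, 1, 0, 1]
print(greedy_sig([1, 5, 3, 2, 4]))  # None
```

The two calls reproduce the introductory example: 1, 3, 5, 2, 4 is heapable, whereas 1, 5, 3, 2, 4 is not, because no free slot has value at most 2 when the 2 arrives.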

Theorem 1. x is heapable if and only if Greedy-Sig returns a solution tree T.

Proof. Let $T_1$ and $T_2$ be binary trees, each with k leaves. Let y be a real number such that
$y \ge \mathrm{sig}(T_2)[1]$. It is easy to see that the following claim holds.

Claim 1. If $\mathrm{sig}(T_1)$ dominates $\mathrm{sig}(T_2)$ then $\mathrm{sig}(T_1')$ dominates $\mathrm{sig}(T_2')$, where $T_2'$ is any valid tree
created by adding y to $T_2$ and $T_1'$ is the tree produced by greedily adding y to $T_1$.

If Greedy-Sig returns a solution then by construction, x is heapable. For the converse, let
$x = x_1, \ldots, x_n$ be a heapable sequence and let $T^*$ be a solution for x. Since $T^*$ is a witness for x,
it defines a sequence of trees $T_1^*, T_2^*, \ldots, T_n^* = T^*$. It follows from Claim 1 that at each iteration,
the greedy tree $T_i$ dominates $T_i^*$, thus Greedy-Sig correctly returns a solution. □

We used Greedy-Sig to compute the probability that a random permutation of n numbers is 
heapable as n varies. The results are displayed in Figure 3.
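For very small n the probability can be computed exactly by enumeration, which gives a sanity check on any simulation. The sketch below is our own illustration and not the procedure used to produce the figure; it relies on Theorem 1 to decide heapability with the greedy rule, and the names `is_heapable` and `heapable_fraction` are hypothetical.

```python
from bisect import bisect_right, insort
from fractions import Fraction
from itertools import permutations

def is_heapable(seq):
    """Decide heapability with the greedy rule, tracking only slot values."""
    sig = []
    for i, x in enumerate(seq):
        if i > 0:
            pos = bisect_right(sig, x) - 1  # largest slot value <= x
            if pos < 0:
                return False
            sig.pop(pos)
        insort(sig, x)
        insort(sig, x)
    return True

def heapable_fraction(n):
    """Exact fraction of the n! permutations of 1..n that are heapable."""
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(is_heapable(p) for p in perms), len(perms))

for n in range(1, 7):
    print(n, heapable_fraction(n))
```

Enumeration is only feasible for tiny n because the number of permutations grows as n!; for larger n one would sample random permutations instead.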

3.2. Hardness of complete heapability. We now show that the problem of deciding whether
a sequence is completely heapable is NP-complete. First, complete heapability is in NP since a
witness for x is just the final tree, T, if one exists. To show hardness, we reduce from the NP-hard
problem Exact Cover by 3-Sets which, when given a set of n elements Y = {1, ..., n} and a
collection of m subsets $C = \{C_1, \ldots, C_m\}$ such that each $C_i \subseteq Y$ and $|C_i| = 3$, asks whether there
exists an exact cover of Y by C: a subset $C' \subseteq C$ such that $|C'| = n/3$ and

$$\bigcup_{C_j \in C'} C_j = Y.$$

Figure 1. A schematic of the heap that x forces. Since the prologue sequence $a_1, \ldots, a_7$ and epilogue
sequence $c_1, c_2, c_3$ are uniquely heapable, the complete heapability of x reduces to fitting the sequence
b into the black area.

Figure 2. An iterative definition of A(x, k, h).

A(x, k, h):
    for i ← 0 to (h - 1) do
        for j ← k · 2^i down to 1 do
            print (x, i, j)
        end for
    end for

3.2.1. Preliminaries. Without loss of generality, we use triples of real numbers in our reduction 
instead of a single real number and rely on lexicographic order for comparison. Our construction 
relies on the following set of claims that force subsequences of $x = x_1, \ldots, x_t$ to be heaped into
specific shapes. 

Claim 2. If $x_i > x_j$ for all $j > i$ then x is heapable only if $x_i$ appears as a leaf in the heap.

Proof. Any child of $x_i$ must have a value $x_j \ge x_i$ with $j > i$, a contradiction. □

Claim 3. If $x' = x'_1, x'_2, \ldots, x'_\ell$ is a decreasing subsequence of x then for all $x'_i$ and $x'_j$, $i \ne j$, $x'_j$
cannot appear in a subtree rooted at $x'_i$ (and vice-versa).

Proof. Take such a subsequence and a pair $x'_i$ and $x'_j$ with $i < j$. $x'_j$ succeeds $x'_i$ in the input, so $x'_i$ cannot be
a descendant of $x'_j$. Also, $x'_j$ cannot be a descendant of $x'_i$ without violating the heap property. □

We use Claim 3 to create sequences that impose some shape on the heap. For example, consider
the sequence u = (1, 0, 2), (1, 0, 1), (1, 1, 4), ..., (1, 1, 1), (1, 2, 8), ..., (1, 2, 1), which, when occurring
after (1, 0, 0), must be heaped into two perfect binary subtrees of height 3. Since we generate
sequences like u often in our reduction, we use A(x, k, h) to denote a sequence of values of length
$k(2^h - 1)$, all of the form (x, *, *), that can be heaped into k perfect binary trees of height h.
Figure 2 gives an iterative definition of A whereby A(1, 2, 3) generates u.
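The iterative definition of A(x, k, h) translates directly into a short generator. The sketch below is illustrative only; the function name `block_A` is ours, and tuples are compared lexicographically, as assumed in the preliminaries.

```python
def block_A(x, k, h):
    """Triples (x, i, j) for levels i = 0..h-1, with j running from k*2^i down to 1."""
    return [(x, i, j) for i in range(h) for j in range(k * 2**i, 0, -1)]

u = block_A(1, 2, 3)
print(len(u))   # k * (2^h - 1) = 14
print(u[:6])    # (1, 0, 2), (1, 0, 1), (1, 1, 4), (1, 1, 3), (1, 1, 2), (1, 1, 1)
```

Within each level the triples are decreasing, so by Claim 3 they must sit side by side, while moving to the next level increases the middle coordinate, so each level can be placed beneath the previous one.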

Claim 4. A sequence A(x, k, h) spans initial width at least k, and consumes depth at most h. These
bounds on width and depth are also simultaneously achievable.

Proof. The initial k values of A(x, k, h) (i = 0 in Figure 2) are decreasing and by Claim 3 must
therefore be placed at k distinct leaves of the heap. The longest increasing subsequence of A(x, k, h)
is formed by choosing one element (x, i, *) for each i, and thus the deepest heapable subsequence
of A(x, k, h) is h. To achieve these bounds tightly, simply store A(x, k, h) level-wise in a row of k
free slots. □

We also define Γ(x, k, h) to be the prefix of A(x, k, h) that omits the final k terms, i.e. a sequence
of length $k(2^h - 2)$ that can be heaped into k complete binary trees with k elements missing in the
final level. We can now generalize Claim 3 as follows:

Claim 5. If $x' = F_1(s_1, k_1, h_1), F_2(s_2, k_2, h_2), \ldots, F_t(s_t, k_t, h_t)$ is a subsequence of x such that the
sequence $\{s_i\}$ is decreasing and such that $F_i \in \{A, \Gamma\}$ for all i, then for every $x'_p \in F_i(s_i, k_i, h_i)$ and
$x'_q \in F_j(s_j, k_j, h_j)$ with $i \ne j$, $x'_p$ and $x'_q$ have no ancestor/descendant relationship.

3.2.2. The Reduction. 

Theorem 2. Complete heapability is NP-hard.

Proof. Given an Exact Cover by 3-Sets instance (Y, C) where |Y| = n and |C| = m, we construct
a sequence x = a, b, c of length $2^h - 1$ where h is the height of the heap and x is partitioned into a
prologue sequence a, a subset sequence b, and an epilogue sequence c.

Prologue sequence. The prologue sequence a consists of seven consecutive sequences
$a = a_1, a_2, a_3, a_4, a_5, a_6, a_7$:

$a_1$: A(-3, 1, $h_1$)
$a_2$: A(Z, $2M_1 - 1$, $h_2 + 3$)
$a_3$: A(-1, 1, $h_2$)
$a_4$: A(Y, $M_2$, 3)
$a_5$: Γ(n - ε, 1, 2), Γ((n - 1) - ε, 1, 2), ..., Γ(1 - ε, 1, 2)
$a_6$: Γ(0 - ε, 3m - n, 2)
$a_7$: A(-2, m/2, 1)

Epilogue sequence. Similarly the epilogue sequence is defined to be $c = c_1, c_2, c_3$:

$c_1$: A(X, 8m, $h_2 - 2$)
$c_2$: A(n, 4, 1), A(n - 1, 4, 1), ..., A(1, 4, 1)
$c_3$: A(0.1, 6m - 2n, 1)

Taken together, the prologue and epilogue sequences enforce the following key property.

Claim 6. The prologue sequence a is uniquely heapable; moreover, if x is completely heapable, then
the epilogue sequence c is uniquely heapable with respect to a and b.

Proof. By Claim 4, the sequence $a_1$ forces a complete binary tree with $N_1$ leaves. Call this tree $T_{a_1}$.
Now consider the subsequence $a_2, a_3, a_7$. Since the sequence Z, -1, -2 is decreasing, by Claim 5
these blocks have no ancestor/descendant relationships. Moreover, since values of $a_3$ are strictly
smaller than those of $a_2$ and values of $a_7$ are strictly smaller than those of $a_2, \ldots, a_6$, these three
blocks must all be rooted at $a_1$. Since $a_2$, $a_3$ and $a_7$ begin with decreasing subsequences of length
$2M_1 - 1$, 1, and m/2 respectively, these values fill the $2(M_1 + m/4)$ children of $a_1$, and thus the
remaining levels of $a_2$ and $a_3$ are forced, also by Claim 4 (see Figure 1).

Next consider the subsequence $a_4, a_5, a_6$. At the time these values are inserted, attachment points
are only available beneath $a_3$, as $a_2$ reached the bottom of the heap and remaining slots below $a_1$
are reserved for $a_7$. Since the sequence Y, n, n - 1, ..., 1 is decreasing, Claim 5 ensures that the
components of $a_4$ through $a_6$ lie side-by-side beneath $a_3$. The construction of $a_5$ forces n free slots
at level $h_1 + h_2 + 2$ beneath parents of respective values n - ε, n - 1 - ε, ..., 1 - ε. The construction
of $a_6$ forces 3m - n free slots at that same level beneath parents of values 0 - ε. The white area of
Figure 1 depicts the final shape of a.

As for the epilogue sequence, by Claim 2 the sequences $c_2$, $c_3$, as well as the final subsequence
in $c_1$, must all be on the bottom row of the heap. This completely fills the bottom row of the heap
(after a). Then by Claim 5, $c_1$, $c_2$ and $c_3$ have no ancestor-descendant relationship, so the rest of
$c_1$ forms a contiguous trapezoid of height $h_2 - 2$ with the top row having length 8m. The grey area
of Figure 1 depicts the final shape of c. □

This property ensures that after uniquely heaping a we produce the specific shape depicted by
the white area in Figure 1. Then, given that sequence c is uniquely heapable with respect to a and
b, c also produces a specific shape depicted by the shaded area in Figure 1. Taken together, the
prologue and epilogue force sequence b to be heaped into the black area of Figure 1.

The height of the heap, h, is defined below. Without any loss of generality, we assume m is a
multiple of 4 and, for convenience, define the following values:

$$h_1 = \lceil \log_2(m/4 + 1) \rceil, \qquad N_1 = 2^{h_1}, \qquad M_1 = N_1 - m/4,$$
$$h_2 = \lceil \log_2(3m/2) \rceil, \qquad N_2 = 2^{h_2}, \qquad M_2 = N_2 - 3m/2.$$

Finally, let $h = h_1 + h_2 + 3$, $K = 2^h$, $L = K + 1$, $X = K + 2$, $Y = K + 2$, $Z = K + 3$ and let ε be a
small constant such that 0 < ε < 1. Z, Y, X, L and K are the 5 largest values appearing in the
first position of any tuple in our sequence x.

Consider Figure 1 again. Sandwiched between $a_7$ and the trapezoid formed by $c_1$ is room for m
complete binary trees of depth 4. We call these the tree slots. A similar sandwich of 3m singleton
slots is formed between $a_5$, $a_6$ on the top and $c_2$, $c_3$ on the bottom. More precisely, from the specific
construction of a and c, there are 3m - n slack slots sandwiched between $a_6$ and $c_3$ and there are
n set cover slots sandwiched between $a_5$ and $c_2$.

Claim 7. Each slack slot can only accept some value in the range (0 - ε, 0.1), and each set cover
slot with parent value i - ε can only accept some value in the range (i - ε, i).

Proof. The values in $c_3$ are strictly smaller than those in $a_5$, so they must be placed below $a_6$.
Each resulting slack slot therefore has a parent of value 0 - ε and two children of value 0.1. Similarly, $c_2$
is heapable below $a_5$ if and only if each sequence A(i, 4, 1) pairs off with and is heaped below the
corresponding sequence Γ(i - ε, 1, 2). □

The centerpiece of our reduction, the subset sequence b, consists of m subsequences
representing the m subsets in C. For each subset $C_i = \{u_i, v_i, w_i\}$, let $u_i < v_i < w_i$ w.l.o.g. and let $b_i$ be
the sequence of 18 values

$b_i$ = (-1, i, 0), (-1, i, 1), (K, i, 1), (K, i, 0), ($u_i$, 0, 0), ($v_i$, 0, 0), ($w_i$, 0, 0),
        A(0, 1, 2), (L, i, 8), (L, i, 7), ..., (L, i, 1).

Now take $b = b_m, b_{m-1}, \ldots, b_1$. Claim 6 implies that if x is completely heapable then b must totally
fit into the remaining free slots of the heap (i.e., the black area in Figure 1).

Claim 8. If x is completely heapable, then the m roots of the complete binary trees comprising the
tree slots must be the initial (-1, i, 0) values from each of the $b_i$ subsequences.

Proof. Observe that the (-1, i, 0) values form a decreasing subsequence, and are too small for any
of the singleton slots. They must therefore occupy space in the m complete binary trees. By
Claim 3 they mutually have no ancestor/descendant relationship, and must be in separate trees.
But as they are the m smallest values in b they must occupy the m roots of these trees. □

Claim 8 implies that the values of each $b_i$ must be slotted into a single binary tree in the black
area of Figure 1 as well as some singleton slots. The following claim shows that the values occupying
the singleton slots correspond to choosing the entire subset $C_i$ or not choosing it at all.

Claim 9. If x is completely heapable, then each $b_i$ sequence fills exactly 15 tree slots from a single
complete binary tree and exactly 3 singleton slots. Furthermore, the 3 singleton values are either
the three values ($u_i$, 0, 0), ($v_i$, 0, 0), ($w_i$, 0, 0) or the three values A(0, 1, 2).

Proof. By Claim 3 together with the preceding claims, the 8m decreasing L values must occupy level 4 (i.e. the final row
of the black area in Figure 1). For a given subsequence $b_i$, Claim 8 implies that the suffix
(L, i, 8), ..., (L, i, 1) occupies the leaves of the binary tree rooted at (-1, i, 0). As a consequence, we
need to select a completely-heapable subsequence of length exactly 7 from the residual prefix of $b_i$
(prior to (L, i, 8)).

First, note that the first four values of $b_i$ must be included, as they cannot be placed elsewhere
in the heap. Moreover, the orientation of these four values is forced: since (K, i, 1) and (K, i, 0)
can only be parents of nodes of the form (L, i, *), they must be placed at level two, with (-1, i, 1)
as their parent at level one.

Now consider ($u_i$, 0, 0). If this value is included in the complete heapable subsequence, its location
is forced to be the available child of the root (-1, i, 0), and therefore both ($v_i$, 0, 0) and ($w_i$, 0, 0)
must also be selected as its children (the zeroes in A(0, 1, 2) are too large to be eligible) to conclude
the complete heapable subsequence. The three values of A(0, 1, 2) are necessarily exiled to slack
slots in this case. Alternatively, if ($u_i$, 0, 0) is not selected in the complete heapable subsequence,
then the three nodes concluding the heapable subsequence must be A(0, 1, 2), since neither ($v_i$, 0, 0)
nor ($w_i$, 0, 0) has two eligible children in the considered prefix of $b_i$. Therefore, the three values
($u_i$, 0, 0), ($v_i$, 0, 0), ($w_i$, 0, 0) are exiled to set cover slots in this case. □

The hardness result follows directly from the following lemma.

Lemma 1. (Y, C) contains an exact cover iff x is completely heapable.

Proof. For the if-direction, examine the complete heap produced by x. For each $b_i$ tree, use subset
$C_i$ as part of the exact cover if and only if that tree includes A(0, 1, 2) in its entirety. By Claim 9
the set values from $C_i$ were all assigned to the set cover slots, which we know enforces a set cover
by Claim 7, so the union of our n/3 subsets is an exact cover.

For the only-if direction, for each subset $C_i$ in the exact cover, heap the subset sequence $b_i$ so
that ($u_i$, 0, 0), ($v_i$, 0, 0), ($w_i$, 0, 0) occupy set cover slots and the remaining 15 values occupy tree
slots. Taken together, these fill up the n set cover slots and n/3 of the complete binary trees. Heap
the m - n/3 subset sequences not in the cover so as to exile triples of the form A(0, 1, 2), filling
up the 3m - n slack slots and the remaining m - n/3 complete binary trees. Since the epilogue c
perfectly seals the frontier created by b, x is completely heapable. □

This completes the proof of Theorem 2. □

3.3. A linear-time algorithm for complete heapability of 0-1 sequences. When we restrict
the problem of complete heapability to 0-1 values, the problem becomes tractable. The basic
idea is that any completely heapable sequence of 0-1 values can be heaped into a canonical shape
dependent only upon the number of 1s appearing in the sequence. After counting the number of
1s, we attempt to heap the sequence into the shape. If it fails, the sequence is not completely heapable.

Without loss of generality, let x be a sequence of $n = 2^k - 1$ 0-1 values since we can always
pad the end of x with 1s without affecting its complete heapability. With 0-1 sequences, once a 1
is placed in the tree, only 1s may appear below it. Thus, in any valid solution tree T for x, the
nodes labelled with 1 form a forest $\mathcal{F}(T)$ of perfect binary trees. Let V(T) be the set of nodes of
T that are labelled with 0 and fall on a path from the root of T to the root of a tree in $\mathcal{F}(T)$. Note
that the nodes in V(T) form a binary tree. Let $y_1, \ldots, y_r$ be the nodes of V(T) in the order they
appear in x. If $y_i$ is a non-full node in V(T) then let $\alpha(y_i)$ be the number of nodes appearing in
the perfect trees of 1s of which $y_i$ is the parent. If $y_i$ is a full node then let $\alpha(y_i) = 0$. Now let
$\beta(y_i) = \alpha(y_i) + \beta(y_{i-1})$ where $\beta(y_1) = \alpha(y_1)$. The values $\beta(y_1), \ldots, \beta(y_r)$ represent the cumulative
number of 1s that the first i nodes in V(T) can absorb from $\mathcal{F}(T)$. That is, after inserting $y_1, \ldots, y_i$,
we can add at most $\beta(y_i)$ of the 1s appearing in $\mathcal{F}(T)$.

Suppose x has m 1s in total and let $T^*$ be a perfect binary tree of height k where the first m
nodes visited in a post-order traversal of $T^*$ are labelled 1 and the remainder of nodes are labelled
0. Note that the nodes labelled with 1 in $T^*$ form a forest $\mathcal{F}(T^*) = T_1^*, T_2^*, \ldots, T_z^*$ of z perfect
binary trees in descending order by height. Let $v_1, \ldots, v_m$ be the nodes of $\mathcal{F}(T^*)$ given by sequential
pre-order traversals of $T_1^*, T_2^*, \ldots, T_z^*$. Let $u_1, \ldots, u_s$ be the nodes given by a pre-order traversal of
$V(T^*)$. We build $T^*$ so that the first s 0s appearing in x are assigned sequentially to $u_1, \ldots, u_s$ and
the m 1s appearing in x are assigned sequentially to $v_1, \ldots, v_m$.

Lemma 2. x is completely heapable if and only if $T^*$ is a valid solution for x.

Proof. It is clear that if $T^*$ is a valid solution for x then, by definition, x is completely heapable.
Now, suppose x is completely heapable. Then there exists a valid solution tree T. We show that
whenever a 1 is added to T, we can also add a 1 to $T^*$. It should be clear that whenever a 0 is
added to T we can add a 0 to $T^*$.

Let $y_1, \ldots, y_r$ be the nodes of V(T) in the order they appear in x. Note that $s \le r$. This follows
because $\mathcal{F}(T^*)$ has the fewest number of binary trees in any valid solution for x. One way to
see this is by imagining each perfect tree of 1s as corresponding to one of the $2^i - 1$ terms in the
(unique) polynomial decomposition of m into $m = a_l(2^l - 1) + a_{l-1}(2^{l-1} - 1) + \cdots + a_1(2^1 - 1)$ where
each coefficient $a_i$ is either 0 or 1 except for the final non-zero coefficient which may be 2. This
is essentially an "off-by-one" binary representation of m. Thus, the perfect trees in $\mathcal{F}(T^*)$ have
strictly decreasing heights except for, potentially, the shortest two trees which may have identical
heights. It is clear that assigning the 0s in this order makes the largest number of 1 slots available
as quickly as possible in any valid solution tree. Thus, for $1 \le j \le s$ we have $\beta(u_j) \ge \beta(y_j)$.
Therefore, anytime a 1 is placed in T, we can place a 1 in $T^*$. □
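The "off-by-one" binary decomposition of m used in the proof can be computed greedily: repeatedly remove the largest term of the form 2^i - 1 that still fits. The sketch below is our own illustration, under the assumption (which follows from the structure of a post-order traversal of a perfect tree) that this greedy choice yields the sizes of the trees in the forest of 1s in $T^*$; the function name is hypothetical.

```python
def perfect_tree_sizes(m):
    """Sizes 2^i - 1 of the perfect trees of 1s, largest first, summing to m."""
    sizes = []
    while m > 0:
        # Largest value of the form 2^i - 1 that is at most m.
        size = 2 ** ((m + 1).bit_length() - 1) - 1
        sizes.append(size)
        m -= size
    return sizes

print(perfect_tree_sizes(5))   # [3, 1, 1]
print(perfect_tree_sizes(13))  # [7, 3, 3]
```

In both outputs only the smallest term is repeated, matching the statement that every coefficient is 0 or 1 except the final non-zero one, which may be 2.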

We're now prepared to prove the main theorem of this section. 

Algorithm 1 Complete-Heap(x) where x is a sequence of n = 2^k - 1 0-1 values

T* ← perfect binary tree with n nodes u_1, ..., u_n
m ← number of 1s in x
Q ← empty queue
F(T*) = {T_1*, ..., T_z*} ← a forest of z trees given by the first m nodes in a post-order
        traversal of T* and ordered by height
for j ← 1 to z do
    Q_j ← a queue of nodes given by a pre-order traversal of T_j*
end for
Q_0 ← a queue of n - m nodes given by a pre-order traversal of T* - F(T*)
for i ← 1 to n do
    if x_i = 0 then
        u ← DEQUEUE(Q_0)
        if u is the parent of some tree T_j* in F(T*) then
            dequeue the elements from Q_j and enqueue them into Q
        end if
    else
        u ← DEQUEUE(Q)
    end if
    if u = NIL then
        return "NOT HEAPABLE"
    else
        assign x_i to u
    end if
end for
return T*

Theorem 3. Complete heapability of sequences of 0-1 values is decidable in linear time. 
Proof. Algorithm 1 provides a definition of Complete-Heap, which we use to decide in linear time
if $x$ is completely heapable. Initially, we build an unlabeled perfect binary tree of height $k$. We also
count the number of 1s appearing in $x$. Both these operations take linear time. Next we identify
where and in what order the 1s should be assigned and build a queue of nodes $Q_i$ for each tree
$T_i^* \in \mathcal{F}(T^*)$. These operations take linear time in total since we can build the $T_i^*$ in one post-order
traversal of $T^*$ and each $Q_i$ can be built from a single pre-order traversal of $T_i^*$. We also identify
where and in what order the 0s should be assigned to $T^*$ and enqueue these nodes in $Q_0$.

Now we simply try to assign each value in $x$ to the appropriate node in $T^*$ if it is available. The
idea is that once the parent of tree $T_i^*$ gets labeled with a 0, the nodes in $Q_i$ become available for
assignment. We can mark these parent nodes ahead of time to ensure our algorithm runs in linear
time. If $Q$ ever runs dry of nodes, then we do not have enough 0s to build the frontier necessary
to handle all the 1s, so $x$ is not completely heapable. On the other hand, if we terminate without
exhausting $Q$, then the sequence is completely heapable. The correctness of the algorithm follows
immediately from Lemma 2. □
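
The procedure of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `complete_heap_01`, the heap-style node numbering (children of node `u` are `2u` and `2u + 1`), and the special handling of an all-ones input (where no 0-labeled parent exists to open the forest) are choices made here for the example.

```python
from collections import deque


def complete_heap_01(x):
    """Try to place the 0-1 sequence x into a perfect binary min-heap.

    Nodes use heap numbering 1..n (children of u are 2u and 2u + 1).
    Returns the labels in that numbering, or None if a 1 arrives before
    any slot for it has been opened.
    """
    n = len(x)
    k = (n + 1).bit_length() - 1
    if n == 0 or (1 << k) - 1 != n:
        raise ValueError("len(x) must equal 2**k - 1 for some k >= 1")
    m = sum(x)  # number of 1s

    def preorder(root, keep):
        """Iterative pre-order traversal over the nodes accepted by keep."""
        out, stack = [], [root]
        while stack:
            u = stack.pop()
            if u <= n and keep(u):
                out.append(u)
                stack.append(2 * u + 1)  # pushed first, so visited after the left child
                stack.append(2 * u)
        return out

    def postorder(u, out):
        if u <= n:
            postorder(2 * u, out)
            postorder(2 * u + 1, out)
            out.append(u)
        return out

    forest = set(postorder(1, [])[:m])                   # nodes reserved for the 1s
    q0 = deque(preorder(1, lambda u: u not in forest))   # slots for the 0s, in order
    q = deque()                                          # slots currently open for 1s
    if m == n:
        # Assumption of this sketch: with no 0s at all, the whole tree is open.
        q.extend(preorder(1, lambda u: True))

    labels = [None] * (n + 1)
    for xi in x:
        if xi == 0:
            u = q0.popleft()
            for child in (2 * u, 2 * u + 1):
                if child in forest:  # u is the parent of a reserved tree
                    q.extend(preorder(child, lambda v: True))
        else:
            if not q:
                return None  # "NOT HEAPABLE"
            u = q.popleft()
        labels[u] = xi
    return labels[1:]
```

For instance, `complete_heap_01([0, 1, 0, 1, 1, 1, 1])` places the two 0s at the root and its right child, while `complete_heap_01([0, 1, 0, 0, 0, 0, 1])` returns `None` because the first 1 arrives before any leaf slot has a labeled parent. Note that a single 0 node may open two reserved trees at once, which the loop over both children handles.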

4. Heapable Subsequences 

In this section, we focus on the case where the sequence corresponds to a random permutation. 
There are three standard models in this setting. In the first, the sequence is known to be a 
permutation of the numbers from 1 to n, and each element is a corresponding integer. Let us call 
this the permutation model. In the second case, the sequence is again known to be a permutation 
of $[1, n]$, but when an element arrives one is given only its ranking relative to previous items. Let
us call this the relative ranking model. In the third, the sequence consists of independent uniform 
random variables on (0, 1). Let us call this the uniform model. All three models are equivalent 
in the offline setting, but they differ in the online setting, where the relative ranking model is the 
most difficult. 
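
To make the relationship between the three models concrete, the following sketch derives the permutation view and the relative ranking view from a sequence of uniform scores. The function names `to_permutation` and `relative_ranks`, and the convention that rank 1 is the smallest value, are assumptions made for this illustration.

```python
import bisect


def to_permutation(scores):
    """Permutation model view: replace each score by its global rank in 1..n."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    perm = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        perm[idx] = rank
    return perm


def relative_ranks(scores):
    """Relative ranking view: rank of each arrival among the items seen so far."""
    seen, out = [], []
    for s in scores:
        pos = bisect.bisect_left(seen, s)
        out.append(pos + 1)   # 1 = smallest so far
        seen.insert(pos, s)
    return out
```

Offline, the full list of relative ranks determines the permutation, which is why the models coincide there; online, an algorithm in the relative ranking model sees only the prefix of `relative_ranks` and so has the least information about where an arrival will eventually sit.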

We first show that the longest heapable subsequence (LHS) in any of these models has length $(1 - o(1))n$
with high probability, and in fact such subsequences can even be found online. For simplicity we
first consider the offline case for the uniform model. We then show how to extend it to the online
setting and to the relative ranking model. (As the permutation model is easier, the result follows
readily for that model as well.) We note that we have not attempted to optimize the $o(1)$ term.
Finding more detailed information regarding the distribution of the LHS in these various settings
is an open problem.

Theorem 4. In the uniform model, the longest heapable subsequence has length $(1 - o(1))n$ with
high probability.

Proof. We break the proof into two stages. We first show that we can obtain an LHS of length
$\Omega(n)$ with high probability. We then bootstrap this result to obtain the theorem.

Let $A_1$ be the subsequence consisting of the elements with scores less than 1/2 in the first $n/2$
elements. With high probability the longest increasing subsequence of $A_1$ is of length $\Omega(\sqrt{n})$.
Organize the elements from the LIS of $A_1$ into a heap, with $F = \Omega(\sqrt{n})$ leaf nodes.

Now let $A_2$ be the subsequence consisting of the elements with scores greater than 1/2 in the
last $n/2$ elements. Starting with the heap obtained from $A_1$, we perform the greedy algorithm for
the elements of $A_2$ until the first time we cannot place an element. Our claim is that with high
probability a linear number of elements are placed before this occurs. Consider the $F$ subheaps,
ordered by their root element in decreasing order. In order not to be able to place an element,
we claim that we have seen a decreasing subsequence of $F$ elements in $A_2$. This follows from the
same argument regarding the length of the LIS derived from patience sorting. Specifically, each
time an element was placed on a subheap other than the first, there must be a corresponding larger
element placed previously on the previous subheap. Hence, when we cannot place an element, we
have placed at least one element on each subheap, leading to a chain corresponding to a decreasing
subsequence of $F$ elements. As $F = \Omega(\sqrt{n})$, with high probability such a subsequence does not
appear until after successfully placing $\Omega(n)$ elements of $A_2$.

Figure 3. The probability that a random permutation of $n$ numbers is heapable as $n$ varies. For
values of $n$ up to 10 the probabilities are exact; for larger values of $n$ they are estimated from a
set of $10! \approx 4 \times 10^6$ sample permutations.

Figure 4. The size of the heap found using the algorithm described in Theorem 4 as well as the
joint length of subsequences $B_1$ and $B_2$, both with respect to the length of the input sequence $n$.

Given this result, we now prove the main result. Let $B_1$ be the subsequence consisting of the
elements less than $n^{-1/8}$ in the first $n^{7/8}$ elements. With high probability there are $\Omega(n^{3/4})$ elements
in $B_1$ using standard Chernoff bounds, and hence by the previous paragraphs we can find an LHS
of $B_1$ of size $\Omega(n^{3/4})$. Now let $B_2$ be the subsequence consisting of the elements greater than $n^{-1/8}$
in the remaining $n - n^{7/8}$ elements. We proceed as before, performing the greedy algorithm for the
elements of $B_2$ until the first time we cannot place an element. For the process to terminate before
all elements of $B_2$ have been placed, $B_2$ would have to have an LDS of length $\Omega(n^{3/4})$, which,
with high probability, does not occur. □
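
The two-stage construction in the proof can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the experiments: the names `lis_items` and `two_stage_heap` are hypothetical, the LIS is placed in level order (so each parent is an earlier, smaller element), and the greedy rule is taken to be "attach the new element below the largest already-placed value not exceeding it that still has a free child slot". The `prefix_len` and `threshold` parameters allow the same routine to play the role of either the $A_1$/$A_2$ stage or the $B_1$/$B_2$ stage.

```python
import bisect
import random


def lis_items(items):
    """Return one longest increasing (by value) subsequence of (index, value) pairs."""
    tails, tails_pos, prev = [], [], [-1] * len(items)
    for p, (_, v) in enumerate(items):
        k = bisect.bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
            tails_pos.append(p)
        else:
            tails[k], tails_pos[k] = v, p
        prev[p] = tails_pos[k - 1] if k > 0 else -1
    out, p = [], (tails_pos[-1] if tails_pos else -1)
    while p != -1:
        out.append(items[p])
        p = prev[p]
    return out[::-1]


def two_stage_heap(scores, prefix_len, threshold):
    """Return heap nodes as (original_index, value, parent_position), in insertion order."""
    low = [(i, s) for i, s in enumerate(scores[:prefix_len]) if s < threshold]
    nodes = []
    for j, (i, s) in enumerate(lis_items(low)):
        # Level-order placement: the parent is an earlier, smaller LIS element.
        nodes.append((i, s, (j - 1) // 2 if j > 0 else -1))
    size = len(nodes)
    slots = []  # free child slots as (owner value, owner position), kept sorted
    for j, (_, s, _) in enumerate(nodes):
        used = sum(1 for c in (2 * j + 1, 2 * j + 2) if c < size)
        slots.extend([(s, j)] * (2 - used))
    slots.sort()
    for i in range(prefix_len, len(scores)):
        s = scores[i]
        if s <= threshold:
            continue  # only the high-scoring elements are used in stage two
        pos = bisect.bisect_right(slots, (s, len(scores))) - 1
        if pos < 0:
            break  # first element that cannot be placed: stop, as in the proof
        _, parent = slots.pop(pos)
        nodes.append((i, s, parent))
        new_slot = (s, len(nodes) - 1)
        bisect.insort(slots, new_slot)
        bisect.insort(slots, new_slot)
    return nodes
```

A call such as `two_stage_heap(scores, len(scores) // 2, 0.5)` corresponds to the first stage of the proof. The list-based slot structure costs linear time per removal and is chosen for brevity; a balanced search tree would be the natural replacement for large inputs.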

We implemented the algorithm described in Theorem 4 and applied it to a range of sequences of
increasing size. Figure 4 displays the size of the resulting heap (averaged over 1000 iterations for
each value of $n$) relative to the length of the original sequence, $n$.

The proof extends to the online case. 

4.1. The case of random permutations. 

Corollary 1. In the uniform model, a heapable subsequence of length $(1 - o(1))n$ can be found
online with high probability.

Proof. We use the fact that there are online algorithms that can obtain increasing subsequences of
length $\Omega(\sqrt{n})$ in random permutations of length $n$ [7]. Using such an algorithm on $A_1$ as above
gives us an appropriate starting point for using the greedy algorithm, which already works in an
online fashion, on $A_2$, to find a heapable subsequence of length $\Omega(n)$ with high probability. We
can then extend the proof as in Theorem 4 to a subsequence of length $(1 - o(1))n$ using the
subsequences $B_1$ and $B_2$. □

There are various ways to extend these results to the relative ranking model. For the offline problem,
we can treat the first $\epsilon n$ elements as a guide for any constant $\epsilon > 0$; after seeing the first $\epsilon n$ elements,
perform the algorithm for the uniform model for the remaining $(1 - \epsilon)n$ elements, treating an element
as having a score less than 1/2 if it falls below the median of the initial $\epsilon n$ elements (that is, it is
ranked higher than half of them) and greater than 1/2 otherwise. The small deviations of the median
of the sample from the true median will not affect the asymptotics of the end result. Then, as in
Theorem 4, bootstrap to obtain an algorithm that finds a sequence of length $(1 - o(1))n$.
For the online problem, we are not aware of results giving bounds on the length of the longest
increasing (or decreasing) subsequence when only relative rankings are given, although it is not
difficult to obtain an $\Omega(\sqrt{n})$ high probability bound given previous results. For example, one could
similarly use the above approach, using the first $\epsilon n$ elements as a guide to assign approximate (0, 1)
values to remaining elements, and then use a variation of the argument of Davis (presented in 
[^ [Section 7] ) to obtain a longest increasing subsequence on the first half of the remaining elements 
of size $\Omega(\sqrt{n})$.

We describe a more direct variation. Order the first $\epsilon n$ elements, and split the lower half of them
by rank into $\sqrt{n}$ subintervals. Now consider the next $(1 - \epsilon)n/2$ elements. Split them, sequentially, into
$\sqrt{n}$ subgroups; if the $i$th subgroup of elements contains an element that falls in the $i$th subinterval,
put it in our longest increasing subsequence. Note that this can be done online, and for each
subinterval the probability of obtaining an element is a constant. Hence the expected size of the
longest increasing subsequence obtained this way is $\Omega(\sqrt{n})$, and a standard martingale argument
can be used to show that in fact this holds with high probability. Then, as before, we can show
that in the next $(1 - \epsilon)n/2$ elements, we add $\Omega(n)$ elements to our heap with high probability using
the greedy algorithm. As before, this gives the first part of our argument, which can again be
bootstrapped.
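
The direct variation just described can be sketched as follows. The function name `online_increasing_by_guide`, the default `eps = 0.1`, and the rounding of subinterval and subgroup sizes are illustrative assumptions; the only information the routine uses about an arriving element is how it compares with elements of the guide prefix, as the relative ranking model requires.

```python
import math
import random


def online_increasing_by_guide(seq, eps=0.1):
    """Pick an increasing subsequence online, comparing arrivals only with a guide prefix.

    Returns a list of (index, value) pairs.
    """
    n = len(seq)
    g = int(eps * n)
    guide = sorted(seq[:g])
    lower = guide[: g // 2]                  # lower half of the guide, by rank
    if not lower:
        return []
    r = max(1, math.isqrt(n))                # about sqrt(n) subintervals and subgroups
    # Subinterval boundaries: evenly spaced elements of the lower half of the guide.
    cuts = [lower[(j * len(lower)) // r] for j in range(r)] + [lower[-1]]
    start, stop = g, g + (n - g) // 2        # the next (1 - eps) n / 2 arrivals
    group = max(1, (stop - start) // r)
    picked = []
    for j in range(r):
        lo, hi = cuts[j], cuts[j + 1]
        for i in range(start + j * group, min(start + (j + 1) * group, stop)):
            if lo < seq[i] < hi:             # arrival falls in the j-th subinterval
                picked.append((i, seq[i]))
                break                        # at most one pick per subgroup
    return picked
```

Because the subintervals are disjoint and visited in increasing order, the picked values are increasing by construction. Each subgroup succeeds independently with roughly constant probability, which is the source of the $\Omega(\sqrt{n})$ expected length in the argument above.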

Corollary 2. In the relative ranking model, a heapable subsequence of length $(1 - o(1))n$ can be
found both offline and online with high probability.

We now turn our attention to the problem of finding the longest completely heapable subsequence
(LCHS) in the uniform and relative ranking models, as well as the associated online problems. For
convenience we start with finding completely heapable subsequences online in the uniform model,
and show that we can obtain a sequence of length $\Omega(n)$ with high probability. Our approach here
is a general technique we call banding; for the $i$th level of the tree, we only accept values within a
band $(a_i, b_i)$. We choose values so that $a_1 < b_1 = a_2 < b_2 = a_3 < \cdots$, so that the bands are disjoint and
naturally yield the heap property. Obviously this gives that the LCHS is $\Omega(n)$ with high probability
as well. We note that no effort has been made to optimize the leading constant in the $\Omega(n)$ term in the
proof below.

Theorem 5. In the uniform model, a completely heapable subsequence of length $\Omega(n)$ can be found
online with high probability.

Proof. As previously, we can find an LIS of size $\Omega(\sqrt{n})$ online within the first $n/2$ elements restricted
to those with value less than 1/2. This will give the first $(\log n)/2 - c_1$ levels of our heap, for some
constant $c_1$.

We now use the banding approach, filling subsequent levels sequentially. Suppose from the LIS
that our bottom level has $t_0$ nodes. Consider the next $u_1$ elements, and for the next level use a
band of size $v_1$, which in this case corresponds to the range $(1/2, 1/2 + v_1)$. We need $t_1 = 2t_0$
elements to fill the next level. Note that if we choose for example $u_1 v_1 = 2t_1 = 4t_0$, we will be
safe, in that Chernoff bounds guarantee we obtain enough elements to fill the next level. We let
$u_1 = 2\sqrt{t_0}\, n^{1/2}$ and $v_1 = u_1/n$.

For each subsequent level we will need twice as many items, so generalizing for the $i$th level after
the base we have $t_i = 2^i t_0$, and we can consider the next $u_i = (\sqrt{2})^{i+1} \sqrt{t_0}\, n^{1/2}$ elements using
a band range of size $v_i = u_i/n$. We continue this for $L$ levels. As long as $\sum_{i=1}^{L} u_i \le n/2$ and
$\sum_{i=1}^{L} v_i \le 1/2$, the banding process can fill up to the $L$th level with high probability. As the sums
are geometric series, it is easy to check that we can take $L = (\log n)/2 - c_2$ for some constant $c_2$
(which will depend on $t_0$). This gives the result, and the resulting tree now has $\log n - c_1 - c_2$
levels, corresponding to $\Omega(n)$ nodes. □
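
The banding parameters in the proof are easy to tabulate. The sketch below, whose function name `banding_schedule` is an assumption made here, computes $t_i = 2^i t_0$, $u_i = (\sqrt{2})^{i+1}\sqrt{t_0}\,n^{1/2}$ and $v_i = u_i/n$ level by level, and stops at the last level $L$ for which both budgets $\sum u_i \le n/2$ and $\sum v_i \le 1/2$ still hold.

```python
import math


def banding_schedule(n, t0):
    """Return [(t_i, u_i, v_i)] for the levels i = 1..L that fit both budgets."""
    levels, used_elems, used_width = [], 0.0, 0.0
    i = 1
    while True:
        t_i = (2 ** i) * t0                                   # nodes needed on level i
        u_i = math.sqrt(2) ** (i + 1) * math.sqrt(t0) * math.sqrt(n)  # arrivals inspected
        v_i = u_i / n                                         # width of the band
        if used_elems + u_i > n / 2 or used_width + v_i > 0.5:
            return levels
        used_elems += u_i
        used_width += v_i
        levels.append((t_i, u_i, v_i))
        i += 1
```

With these choices $u_i v_i = 2 t_i$ at every level, so each band is expected to receive twice as many elements as the level needs. Since $v_i = u_i/n$, the two budget conditions are in fact the same condition, and the geometric growth of $u_i$ is what limits $L$ to roughly $(\log n)/2$ minus a constant.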

We implemented the algorithm described in Theorem 5 and applied it to a range of sequences
of increasing size. For each sequence size, Figure 5 displays the average number of levels in the
resulting perfect heap. We verify that the number of elements of the resulting heap grows linearly
with the length of the original sequence, as expected.

Figure 5. The number of levels in a perfect heap constructed using the algorithm described in
Theorem 5 as $n$ varies. Note the logarithmic scale of the x-axis.

Figure 6. An illustration of Theorem 6 for $n = 32$. The elements are ordered left-to-right,
top-to-bottom. For example, 8 precedes 7 and 1 precedes 16. A representative longest increasing
heapable subsequence is highlighted.

We can similarly extend this proof to the relative ranking case. As before, using the first $\epsilon n$
elements as guides by splitting the lower half of these elements into $\sqrt{n}$ regions, we can obtain an
increasing sequence of size $\Omega(\sqrt{n})$ to provide the first $(\log n)/2 - c_1$ levels of the heap. We then use
the banding approach, but instead base the bands on the upper half of the first $\epsilon n$ elements in the natural
way. That is, we follow the same banding approach as in the uniform model, except when the band
range is $(\alpha, \beta)$ in the uniform model, we take elements with rankings that fall between the $\lceil \alpha \epsilon n \rceil$th
and $\lfloor \beta \epsilon n \rfloor$th of the first $\epsilon n$ elements. It is straightforward to show that with high probability this
suffices to successfully fill an additional $(\log n)/2 - c_2$ levels, again giving a completely heapable
subsequence of length $\Omega(n)$.

Again, for all of these variations, the question of finding exact asymptotics or distributions of
the various quantities provides interesting open problems.

4.2. Longest increasing and decreasing heapable subsequences. Because the longest heapable
subsequence problem is a natural variation of the longest increasing subsequence problem,
and the latter has given rise to many interesting combinatorial problems and mathematical connections,
we expect that the introduction of these ideas will lead to many interesting problems worth
studying. For example, as we have mentioned, one of the early results in the study of increasing
subsequences, due to Erdős and Szekeres, is that every sequence of $n^2 + 1$ distinct numbers has
either an increasing or decreasing subsequence of length $n + 1$ [4]. One could similarly ask about the
longest increasing or decreasing heapable subsequence within a sequence. We have the following
simple upper bound; we do not know whether it is tight.

Theorem 6. There are sequences of $n$ elements such that the longest increasing or decreasing
heapable subsequence has length $O(n/\log n)$.

Proof. In fact we can show something stronger; there are sequences such that the longest increasing
heapable subsequence and the longest decreasing subsequence both have length $O(n/\log n)$. Consider
the following construction: we begin by splitting the sequence of $n$ elements into $B$ equally sized
blocks. Each block is a decreasing subsequence, and the subsequences are in increasing order, as
illustrated in Figure 6. It can be easily seen that the longest decreasing subsequence has length
$n/B$. For the longest increasing heapable subsequence, note that our optimal choice is to take
one element from the first block, two from the next block, and so on. We want to select
an appropriate value for $B$ so that the last block is the last full level of our increasing heap. The
number of heap elements is then $2^B - 1$. Setting $2^B - 1$ and $n/B$ equal we have $B(2^B - 1) = n$, which
for large $n$ is approximated by $B 2^B = n$. The solution to this equation is $B = W(n \ln 2)/\ln 2$,
where $W$ is the Lambert W function. The latter has no elementary closed form but a reasonable approximation
for $B$ is $\log n - \log \log n$, so asymptotically we can arrange a bound of $O(n/\log n)$. □
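
The block construction from the proof is small enough to check directly. In the sketch below the names `block_sequence`, `longest_decreasing_length` and `doubling_heap_size` are hypothetical; the last function simply evaluates the count claimed in the proof (one element from the first block, two from the next, and so on, capped by the block size) and is not an independent search for the longest increasing heapable subsequence.

```python
import bisect


def block_sequence(n, num_blocks):
    """Equal-sized decreasing blocks, with the blocks themselves in increasing order."""
    size = n // num_blocks
    return [b * size + v for b in range(num_blocks) for v in range(size, 0, -1)]


def longest_decreasing_length(seq):
    """Length of the longest strictly decreasing subsequence (patience sorting)."""
    tails = []
    for x in seq:
        k = bisect.bisect_left(tails, -x)   # work on negated values
        if k == len(tails):
            tails.append(-x)
        else:
            tails[k] = -x
    return len(tails)


def doubling_heap_size(n, num_blocks):
    """Heap size from the proof: min(2^j, block size) elements from block j = 0, 1, ..."""
    size = n // num_blocks
    return sum(min(2 ** j, size) for j in range(num_blocks))
```

For the setting of Figure 6, `block_sequence(32, 4)` begins 8, 7, ..., 1, 16, its longest decreasing subsequence has length 8, and the doubling count gives 1 + 2 + 4 + 8 = 15 heap elements.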

5. Open Problems 

Besides finding tight bounds for the problem in the previous section, there are several other 
interesting open questions we have left for further research. 

• Is there an efficient algorithm for finding the longest heapable subsequence, or is it also 
NP-hard? If it is hard, are there good approximations? 

• For binary alphabets, we have shown complete heapability can be decided in linear time, 
while for permutations on n elements, the problem is NP-hard. What is the complexity for 
intermediate alphabet sizes? 

• What is the probability that a random permutation is heapable, either exactly or asymptotically?

• Can we find the exact expected length or the size distribution of the longest heapable 
subsequence of a random permutation? The longest completely-heapable subsequence? 

• The survey of Aldous and Diaconis [1] for LIS shows several interesting connections between
that problem and patience sorting, Young tableaux, and Hammersley's interacting particle
system. Can we make similar connections to these or other problems to gain insight into
the LHS of sequences?

We expect several other combinatorial variations to arise. 

There are also many open problems relating to our original motivation: viewing this process as 
a variation of the hiring problem. For example, we can consider the quality of a hiring process 
as corresponding to some function of the ranking or scores of the people hired, as in [3]. Here we
have focused primarily on questions of maximizing the length of the sequence, or equivalently the 
number of people hired. More general reward functions, such as penalizing unfilled positions or 
allowing for errors such as an employee being more qualified than their boss in the hierarchy tree, 
seem worthy of further exploration. 